dysarthric speech
Aligning ASR Evaluation with Human and LLM Judgments: Intelligibility Metrics Using Phonetic, Semantic, and NLI Approaches
Phukon, Bornali, Zheng, Xiuwen, Hasegawa-Johnson, Mark
Traditional ASR metrics like WER and CER fail to capture intelligibility, especially for dysarthric and dysphonic speech, where semantic alignment matters more than exact word matches. ASR systems struggle with these speech types, often producing errors like phoneme repetitions and imprecise consonants, yet the meaning remains clear to human listeners. We identify two key challenges: (1) existing metrics do not adequately reflect intelligibility, and (2) while LLMs can refine ASR output, their effectiveness in correcting ASR transcripts of dysarthric speech remains underexplored. To address these challenges, we propose a novel metric integrating Natural Language Inference (NLI) scores, semantic similarity, and phonetic similarity. Our ASR evaluation metric achieves a 0.890 correlation with human judgments on Speech Accessibility Project data, surpassing traditional methods and emphasizing the need to prioritize intelligibility over error-based measures.
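A minimal sketch of how such a combined metric could be assembled, assuming off-the-shelf checkpoints (roberta-large-mnli for NLI, all-MiniLM-L6-v2 for semantic similarity), a crude string-overlap stand-in for phonetic similarity, and illustrative weights; the paper's exact formulation is not reproduced here:

```python
# Hypothetical combination of NLI entailment, semantic similarity, and a
# phonetic-similarity proxy into one intelligibility score (illustrative
# weights; not the authors' fitted formula).
from difflib import SequenceMatcher

import torch
from sentence_transformers import SentenceTransformer, util
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NLI_NAME = "roberta-large-mnli"  # assumed off-the-shelf NLI model
nli_tok = AutoTokenizer.from_pretrained(NLI_NAME)
nli_model = AutoModelForSequenceClassification.from_pretrained(NLI_NAME)
sem_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed sentence encoder


def nli_entailment(reference: str, hypothesis: str) -> float:
    """Probability that the reference entails the ASR hypothesis."""
    inputs = nli_tok(reference, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    # roberta-large-mnli label order: contradiction, neutral, entailment
    return logits.softmax(dim=-1)[0, 2].item()


def semantic_similarity(reference: str, hypothesis: str) -> float:
    emb = sem_model.encode([reference, hypothesis], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()


def phonetic_similarity(reference: str, hypothesis: str) -> float:
    # Crude proxy: surface-string overlap; a real system would compare
    # G2P/phonemizer outputs at the phoneme level instead.
    return SequenceMatcher(None, reference.lower(), hypothesis.lower()).ratio()


def intelligibility_score(ref: str, hyp: str, w=(0.4, 0.4, 0.2)) -> float:
    return (w[0] * nli_entailment(ref, hyp)
            + w[1] * semantic_similarity(ref, hyp)
            + w[2] * phonetic_similarity(ref, hyp))


# Repetitions and small phone-level slips barely hurt the score when the
# meaning survives, unlike WER.
print(intelligibility_score("the weather is nice today",
                            "the the wea weather is nice to day"))
```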
Bridging the Perceptual-Statistical Gap in Dysarthria Assessment: Why Machine Learning Still Falls Short
Automated dysarthria detection and severity assessment from speech have attracted significant research attention due to their potential clinical impact. Despite rapid progress in acoustic modeling and deep learning, models still fall short of human expert performance. This manuscript provides a comprehensive analysis of the reasons behind this gap, emphasizing a conceptual divergence we term the "perceptual-statistical gap". We detail human expert perceptual processes, survey machine learning representations and methods, review existing literature on feature sets and modeling strategies, and present a theoretical analysis of the limits imposed by label noise and inter-rater variability. We further outline practical strategies to narrow the gap, including perceptually motivated features, self-supervised pre-training, ASR-informed objectives, multimodal fusion, human-in-the-loop training, and explainability methods. Finally, we propose experimental protocols and evaluation metrics aligned with clinical goals to guide future research toward clinically reliable and interpretable dysarthria assessment tools.

Index Terms: Dysarthria assessment, speech intelligibility, perceptual modeling, machine learning, human-AI gap, explainable AI, self-supervised learning

1. INTRODUCTION
Dysarthria comprises a set of motor speech disorders resulting from neurological impairment, such as Parkinson's disease, amyotrophic lateral sclerosis (ALS), stroke, or cerebral palsy, that affects speech motor control and coordination [1].
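One step of that theoretical analysis admits a quick illustration: when the reference labels themselves disagree with the latent ground truth, even a perfect model's measured accuracy is capped at roughly one minus the disagreement rate. A toy simulation (binary severity labels and a 15% rater-noise rate, both assumptions for illustration):

```python
# Toy simulation of the measurement ceiling imposed by noisy rater labels:
# a model that predicts the latent truth perfectly still scores ~1 - noise.
import numpy as np

rng = np.random.default_rng(0)
true = rng.integers(0, 2, size=100_000)    # latent "true" severity class
noise_rate = 0.15                          # assumed rater disagreement rate
flip = rng.random(true.shape) < noise_rate
rated = np.where(flip, 1 - true, true)     # noisy reference labels

perfect_model = true                       # oracle predictions
print((perfect_model == rated).mean())     # ~0.85, never 1.0
```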
Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages
Li, Chin-Jou, Yeo, Eunjung, Choi, Kwanghee, Pérez-Toro, Paula Andrea, Someki, Masao, Das, Rohan Kumar, Yue, Zhengjun, Orozco-Arroyave, Juan Rafael, Nöth, Elmar, Mortensen, David R.
Automatic speech recognition (ASR) for dysarthric speech remains challenging due to data scarcity, particularly in non-English languages. To address this, we fine-tune a voice conversion (VC) model on English dysarthric speech (UASpeech) to encode both speaker characteristics and prosodic distortions, then apply it to convert healthy non-English speech (FLEURS) into non-English dysarthric-like speech. The generated data is then used to fine-tune a multilingual ASR model, Massively Multilingual Speech (MMS), for improved dysarthric speech recognition. Evaluation on PC-GITA (Spanish), EasyCall (Italian), and SSNCE (Tamil) demonstrates that VC with both speaker and prosody conversion significantly outperforms the off-the-shelf MMS performance and conventional augmentation techniques such as speed and tempo perturbation. Objective and subjective analyses of the generated data further confirm that the generated speech simulates dysarthric characteristics.
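As a point of reference for the baselines named above, here is a minimal sketch of speed/tempo perturbation plus off-the-shelf MMS decoding (checkpoint name and language-adapter calls as in the Hugging Face MMS documentation; the file name, 16 kHz input, and Spanish target are assumptions):

```python
# Conventional augmentation (speed/tempo perturbation) and off-the-shelf MMS
# transcription; the paper's VC-based augmentation itself is not shown.
import torch
import torchaudio
from transformers import AutoProcessor, Wav2Vec2ForCTC

wav, sr = torchaudio.load("clip.wav")  # hypothetical 16 kHz mono recording

# Speed and tempo perturbation via sox effects ("rate" restores the original
# sample rate after "speed" changes it).
speed_wav, _ = torchaudio.sox_effects.apply_effects_tensor(
    wav, sr, [["speed", "0.9"], ["rate", str(sr)]])
tempo_wav, _ = torchaudio.sox_effects.apply_effects_tensor(
    wav, sr, [["tempo", "1.1"]])

processor = AutoProcessor.from_pretrained("facebook/mms-1b-all")
model = Wav2Vec2ForCTC.from_pretrained("facebook/mms-1b-all")
processor.tokenizer.set_target_lang("spa")  # Spanish, matching PC-GITA
model.load_adapter("spa")

inputs = processor(speed_wav.squeeze().numpy(), sampling_rate=sr,
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1))[0])
```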
Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
Speech disorders can make communication hard or even impossible for those who develop them. Personalised Text-to-Speech is an attractive option as a communication aid. We attempt voice reconstruction using a large speech model, with which we generate an approximation of a dysarthric speaker's voice prior to the onset of their condition. In particular, we investigate whether a state-of-the-art large speech model, Parler TTS, can generate intelligible speech while maintaining speaker identity. We curate a dataset and annotate it with relevant speaker and intelligibility information, and use this to fine-tune the model. Our results show that the model can indeed learn to generate from the distribution of this challenging data, but struggles to control intelligibility and to maintain consistent speaker identity. We propose future directions to improve the controllability of this class of model for the voice reconstruction task.
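For orientation, the base model's documented interface conditions generation on a free-text speaker description, which is also the natural hook for the speaker and intelligibility annotations the paper fine-tunes on. A minimal sketch following the Parler-TTS README (checkpoint name from the README; the prompt and description text are invented, not the paper's annotation scheme):

```python
# Parler-TTS generation conditioned on a text description of the target voice.
import soundfile as sf
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

name = "parler-tts/parler-tts-mini-v1"
model = ParlerTTSForConditionalGeneration.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

prompt = "Hey, how are you doing today?"
description = ("A male speaker with a low-pitched voice speaks slowly "
               "with high clarity and very little background noise.")

input_ids = tokenizer(description, return_tensors="pt").input_ids
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    audio = model.generate(input_ids=input_ids, prompt_input_ids=prompt_ids)
sf.write("reconstructed.wav", audio.cpu().numpy().squeeze(),
         model.config.sampling_rate)
```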
Finding My Voice: Generative Reconstruction of Disordered Speech for Automated Clinical Evaluation
Rosero, Karen, Yeo, Eunjung, Mortensen, David R., Van't Slot, Cortney, Hallac, Rami R., Busso, Carlos
We present ChiReSSD, a speech reconstruction framework that preserves child speakers' identity while suppressing mispronunciations. Unlike prior approaches trained on healthy adult speech, ChiReSSD adapts to the voices of children with speech sound disorders (SSD), with particular emphasis on pitch and prosody. We evaluate our method on the STAR dataset and report substantial improvements in lexical accuracy and speaker identity preservation. Furthermore, we automatically predict the phonetic content of the original and reconstructed pairs, where the proportion of corrected consonants is comparable to the percentage of correct consonants (PCC), a clinical speech assessment metric. Our experiments show a Pearson correlation of ρ = 0.63 between automatic and human expert annotations, highlighting the potential to reduce the manual transcription burden. In addition, experiments on the TORGO dataset demonstrate effective generalization for reconstructing adult dysarthric speech. Our results indicate that disentangled, style-based TTS reconstruction can provide identity-preserving speech across diverse clinical populations.
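The PCC metric referenced above is simple to state: the share of target consonants that are realized correctly. A toy computation over pre-aligned phone pairs (the alignment step and the consonant inventory here are simplified assumptions):

```python
# Percentage of correct consonants (PCC) over aligned (target, produced) phones.
CONSONANTS = set("p b t d k g f v s z m n l r w j h".split())  # toy inventory


def pcc(aligned_pairs):
    """aligned_pairs: list of (target_phone, produced_phone) tuples."""
    targets = [(t, p) for t, p in aligned_pairs if t in CONSONANTS]
    correct = sum(1 for t, p in targets if t == p)
    return 100.0 * correct / max(1, len(targets))


# "star" with /r/ misproduced: 2 of 3 consonants correct; vowels are ignored.
print(pcc([("s", "s"), ("t", "t"), ("a", "a"), ("r", "h")]))  # 66.7
```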
Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS
Anuprabha, M, Gurugubelli, Krishna, Vuppala, Anil Kumar
Dysarthric speech poses significant challenges in developing assistive technologies, primarily due to the limited availability of data. Recent advances in neural speech synthesis, especially zero-shot voice cloning, facilitate synthetic speech generation for data augmentation; however, they may introduce biases towards dysarthric speech. In this paper, we investigate the effectiveness of the state-of-the-art F5-TTS in cloning dysarthric speech using the TORGO dataset, focusing on intelligibility, speaker similarity, and prosody preservation. We also analyze potential biases using fairness metrics like Disparate Impact and Parity Difference to assess disparities across dysarthric severity levels. Results show that F5-TTS exhibits a strong bias toward speech intelligibility over speaker and prosody preservation in dysarthric speech synthesis. Insights from this study can help guide fairness-aware dysarthric speech synthesis, fostering the advancement of more inclusive speech technologies.
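The two fairness metrics named above have standard definitions: Disparate Impact is the ratio of favorable-outcome rates between an unprivileged and a privileged group, and (Statistical) Parity Difference is their difference. A sketch with invented per-severity success rates (e.g., the fraction of cloned utterances judged intelligible):

```python
# Standard group-fairness metrics applied to per-severity success rates;
# the rates below are invented for illustration.
def disparate_impact(rate_unprivileged: float, rate_privileged: float) -> float:
    return rate_unprivileged / rate_privileged


def parity_difference(rate_unprivileged: float, rate_privileged: float) -> float:
    return rate_unprivileged - rate_privileged


# Severe-dysarthria clones judged intelligible 40% of the time vs 80% for mild:
print(disparate_impact(0.40, 0.80))   # 0.5; the common "80% rule" flags < 0.8
print(parity_difference(0.40, 0.80))  # -0.4; 0.0 would mean parity
```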
Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching
Das, Shoutrik, Singh, Nishant, Gangwar, Arjun, Umesh, S
Dysarthria is a neurological disorder that significantly impairs speech intelligibility, often rendering affected individuals unable to communicate effectively. This necessitates the development of robust dysarthric-to-regular speech conversion techniques. In this work, we investigate the utility and limitations of self-supervised learning (SSL) features and their quantized representations as an alternative to mel-spectrograms for speech generation. Additionally, we explore methods to mitigate speaker variability by generating clean speech in a single-speaker voice using features extracted from WavLM. To this end, we propose a fully non-autoregressive approach that leverages Conditional Flow Matching (CFM) with Diffusion Transformers to learn a direct mapping from dysarthric to clean speech. Our findings highlight the effectiveness of discrete acoustic units in improving intelligibility while achieving faster convergence compared to traditional mel-spectrogram-based approaches.
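A minimal sketch of one conditional flow-matching training step in its linear-interpolation (rectified-flow) form, with a toy MLP standing in for the paper's Diffusion Transformer and random tensors standing in for WavLM-derived features; all dimensions are invented:

```python
# One CFM training step: regress a velocity field toward the constant
# straight-line velocity between noise x0 and clean target x1, conditioned
# on dysarthric-speech features (all shapes/modules are illustrative).
import torch
import torch.nn as nn

class VelocityNet(nn.Module):  # stand-in for the Diffusion Transformer
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim + 1, 512), nn.SiLU(),
                                 nn.Linear(512, dim))

    def forward(self, x_t, t, cond):
        return self.net(torch.cat([x_t, cond, t], dim=-1))

model = VelocityNet()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x1 = torch.randn(8, 256)        # clean-speech features (target)
cond = torch.randn(8, 256)      # dysarthric-speech features (condition)
x0 = torch.randn_like(x1)       # noise sample
t = torch.rand(8, 1)            # uniform time in [0, 1]

x_t = (1 - t) * x0 + t * x1     # straight-line interpolant
v_target = x1 - x0              # its (constant) velocity
loss = nn.functional.mse_loss(model(x_t, t, cond), v_target)

opt.zero_grad(); loss.backward(); opt.step()
```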
Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech
El Hajal, Karl, Hermann, Enno, Hovsepyan, Sevada, Magimai-Doss, Mathew
Automatic speech recognition (ASR) systems struggle with dysarthric speech due to high inter-speaker variability and slow speaking rates. To address this, we explore dysarthric-to-healthy speech conversion for improved ASR performance. Our approach extends the Rhythm and Voice (RnV) conversion framework by introducing a syllable-based rhythm modeling method suited for dysarthric speech. We assess its impact on ASR by training LF-MMI models and fine-tuning Whisper on converted speech. Experiments on the Torgo corpus reveal that LF-MMI achieves significant word error rate reductions, especially for more severe cases of dysarthria, while fine-tuning Whisper on converted data has minimal effect on its performance. These results highlight the potential of unsupervised rhythm and voice conversion for dysarthric ASR. Code available at: https://github.com/idiap/RnV
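As a companion to the Whisper experiments, a minimal transcription sketch with Hugging Face transformers (checkpoint size, file name, and English target are placeholders; the rhythm and voice conversion step itself lives in the linked RnV repository):

```python
# Transcribe a (hypothetically converted) utterance with Whisper.
import torchaudio
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

wav, sr = torchaudio.load("converted.wav")                 # placeholder file
wav = torchaudio.functional.resample(wav, sr, 16000).mean(dim=0)
inputs = processor(wav.numpy(), sampling_rate=16000, return_tensors="pt")
ids = model.generate(inputs.input_features, language="en", task="transcribe")
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```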
Exploring Generative Error Correction for Dysarthric Speech Recognition
La Quatra, Moreno, Koudounas, Alkis, Salerno, Valerio Mario, Siniscalchi, Sabato Marco
Despite the remarkable progress in end-to-end Automatic Speech Recognition (ASR) engines, accurately transcribing dysarthric speech remains a major challenge. In this work, we propose a two-stage framework for the Speech Accessibility Project Challenge at INTERSPEECH 2025, which combines cutting-edge speech recognition models with LLM-based generative error correction (GER). We assess different configurations of model scales and training strategies, incorporating specific hypothesis selection to improve transcription accuracy. Experiments on the Speech Accessibility Project dataset demonstrate the strength of our approach on structured and spontaneous speech, while highlighting challenges in single-word recognition.
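In GER setups of this kind, the LLM typically sees an N-best list from the first-stage recognizer and emits a single corrected transcript; a toy sketch (model choice, prompt wording, and the N-best list are all invented, not the challenge system's configuration):

```python
# Toy generative error correction: prompt an instruction-tuned LLM with an
# ASR N-best list and decode one corrected transcript.
from transformers import pipeline

nbest = [                       # invented first-stage hypotheses
    "please call the the doctor",
    "please call the doctor",
    "please call a doctor",
]
prompt = ("Candidate ASR transcripts of one utterance by a speaker with "
          "dysarthria:\n"
          + "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
          + "\nOutput the single most likely intended sentence.\nAnswer:")

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
out = generator(prompt, max_new_tokens=20, do_sample=False)
print(out[0]["generated_text"][len(prompt):].strip())
```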
Potential Applications of Artificial Intelligence for Cross-language Intelligibility Assessment of Dysarthric Speech
Yeo, Eunjung, Liss, Julie, Berisha, Visar, Mortensen, David
Purpose: This commentary introduces how artificial intelligence (AI) can be leveraged to advance cross-language intelligibility assessment of dysarthric speech. Method: We propose a conceptual framework consisting of a universal model that captures language-universal speech impairments and a language-specific intelligibility model that incorporates linguistic nuances. Additionally, we identify key barriers to cross-language intelligibility assessment, including data scarcity, annotation complexity, and limited linguistic insights, and present AI-driven solutions to overcome these challenges. Conclusion: Advances in AI offer transformative opportunities to enhance cross-language intelligibility assessment for dysarthric speech by balancing scalability across languages with adaptability to individual languages.